accuracy measure

Terms from Artificial Intelligence: humans at the heart of algorithms

Although we use the single term accuracy in day-to-day speech, there are many different kinds of accuracy measures depending on the kind of data and application. Often the different measures are contradictory: getting the best accuracy on one metric means sacrificing accuracy on another, as in the precision–recall trade-off.
For numeric data the most common measure is root mean square (RMS) error, in part because it has nice statistical properties; for example, linear regression is about finding the line through data that minimises RMS error. RMS is affected particularly strongly by small numbers of extreme values, so average absolute difference may be used instead. If we are interested in worst-case scenarios, the maximum difference may be more useful.
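As a minimal illustrative sketch (the data values below are made up, not from the text), the three numeric measures can be compared directly; note how the single extreme value dominates RMS:

```python
import numpy as np

actual = np.array([2.0, 3.5, 4.0, 100.0])     # last entry is an extreme value
predicted = np.array([2.1, 3.4, 4.2, 90.0])

errors = predicted - actual
rms = np.sqrt(np.mean(errors ** 2))           # penalises extreme values heavily
mean_abs = np.mean(np.abs(errors))            # less sensitive to outliers
max_diff = np.max(np.abs(errors))             # worst-case measure

print(f"RMS: {rms:.2f}, mean absolute: {mean_abs:.2f}, maximum: {max_diff:.2f}")
```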
For classifications, even binary choices, the situation is yet more complex. For binary choices there are two main kinds of error: false positives, when we assign something to a class (say a disease diagnosis) but it is actually not in the class, and false negatives, when we fail to recognise a true diagnosis. If the probability of a false positive is low we have high precision, and if the probability of a false negative is low we have high recall -- which we want depends on the relative costs of the different kinds of error. These are sometimes combined into a single measure, most commonly the F-score. If we have evidence (say a confidence measure from a machine learning algorithm) and use a threshold to determine our decisions, then increasing the threshold means we may have more false negatives, whereas reducing it means we have more false positives. The ROC curve visualises this trade-off.
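A minimal sketch of this trade-off, using made-up confidence scores and ground-truth labels (not from the text): raising the threshold trades recall (more false negatives) for precision (fewer false positives).

```python
def precision_recall_f1(scores, labels, threshold):
    # Predict positive when the confidence score reaches the threshold.
    predictions = [s >= threshold for s in scores]
    tp = sum(p and l for p, l in zip(predictions, labels))        # true positives
    fp = sum(p and not l for p, l in zip(predictions, labels))    # false positives
    fn = sum((not p) and l for p, l in zip(predictions, labels))  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]            # classifier confidence
labels = [True, True, False, True, False, False]   # ground truth

for threshold in (0.25, 0.5, 0.75):
    p, r, f = precision_recall_f1(scores, labels, threshold)
    print(f"threshold {threshold}: precision {p:.2f}, recall {r:.2f}, F1 {f:.2f}")
```

Sweeping the threshold over all its values and plotting the resulting error rates gives the ROC curve mentioned above.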

Used on pages 180, 196

Also known as accuracy metrics

ROC curve -- trade-off between false positive and false negative rates